
    What Makes a Good Plan? An Efficient Planning Approach to Control Diffusion Processes in Networks

    In this paper, we analyze the quality of a large class of simple dynamic resource allocation (DRA) strategies, which we name priority planning. Their aim is to control an undesired diffusion process by distributing resources to the contagious nodes of the network according to a predefined priority order. In our analysis, we reduce the DRA problem to the linear arrangement of the nodes of the network. From this perspective, we shed light on the role of a fundamental characteristic of this arrangement, the maximum cutwidth, for assessing the quality of any priority planning strategy. Our theoretical analysis validates the role of the maximum cutwidth by deriving bounds for the extinction time of the diffusion process. Finally, using the results of our analysis, we propose a novel and efficient DRA strategy, called Maximum Cutwidth Minimization, that outperforms other competing strategies in our simulations. Comment: 18 pages, 3 figures
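
    The central quantity in this analysis is the maximum cutwidth of a linear arrangement: for each prefix of the node ordering, count the edges crossing from that prefix to the rest of the graph, and take the largest such count. The snippet below is a minimal Python sketch of that quantity only; the function name, the edge-list representation, and the toy network are illustrative choices, not the paper's Maximum Cutwidth Minimization strategy.

        def max_cutwidth(edges, ordering):
            """Maximum cutwidth of a graph under a given linear arrangement.

            The cut after position i separates the first i nodes of the
            ordering from the rest; its width is the number of edges with
            exactly one endpoint on each side. The maximum cutwidth is the
            largest width over all such cuts.
            """
            position = {node: idx for idx, node in enumerate(ordering)}
            widest = 0
            for i in range(1, len(ordering)):
                crossing = sum(
                    1 for u, v in edges
                    if (position[u] < i) != (position[v] < i)
                )
                widest = max(widest, crossing)
            return widest

        # Toy contact network (edge list) and an arbitrary priority order.
        edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
        print(max_cutwidth(edges, ordering=[0, 1, 2, 3, 4, 5]))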

    Multivariate Hawkes Processes for Large-scale Inference

    In this paper, we present a framework for fitting multivariate Hawkes processes to large-scale problems, both in the number of events in the observed history $n$ and the number of event types $d$ (i.e. dimensions). The proposed Low-Rank Hawkes Process (LRHP) framework introduces a low-rank approximation of the kernel matrix that allows performing the nonparametric learning of the $d^2$ triggering kernels using at most $O(ndr^2)$ operations, where $r$ is the rank of the approximation ($r \ll d, n$). This comes as a major improvement over the existing state-of-the-art inference algorithms, which are in $O(nd^2)$. Furthermore, the low-rank approximation allows LRHP to learn representative patterns of interaction between event types, which may be valuable for the analysis of such complex processes in real-world datasets. The efficiency and scalability of our approach are illustrated with numerical experiments on simulated as well as real datasets. Comment: 16 pages, 5 figures
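
    The key idea is that the $d \times d$ matrix of pairwise excitations is replaced by a rank-$r$ factorization, so the excitation felt by all $d$ types can be accumulated in an $r$-dimensional space rather than type by type. The sketch below illustrates that factorization for intensity evaluation under an assumed exponential triggering kernel; the actual LRHP framework learns the triggering kernels nonparametrically, and all names and parameter values here are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        d, r = 50, 5                               # event types and rank of the approximation
        beta = 1.0                                 # decay of the assumed exponential kernel
        mu = rng.uniform(0.01, 0.05, size=d)       # baseline intensities
        U = rng.uniform(0.0, 0.1, size=(d, r))     # low-rank factors: excitation matrix A ~ U @ V.T
        V = rng.uniform(0.0, 0.1, size=(d, r))

        def intensities(t, history):
            """Conditional intensities of all d event types at time t.

            With A = U @ V.T, the excitation from past events is accumulated
            in the r-dimensional factor space: O(r) work per past event plus
            one O(d*r) product, instead of O(d) work per past event.
            """
            acc = np.zeros(r)
            for t_j, k_j in history:               # history: list of (time, event type)
                if t_j < t:
                    acc += V[k_j] * np.exp(-beta * (t - t_j))
            return mu + U @ acc

        print(intensities(t=1.0, history=[(0.2, 3), (0.5, 17), (0.9, 3)]))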

    A framework for paired-sample hypothesis testing for high-dimensional data

    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have so far been proposed for extending this strategy to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. The optimal scoring function can then be obtained as the pseudomedian of those rules, which we estimate by a natural extension of the Hodges-Lehmann estimator. We accordingly propose a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach yields substantial gains in testing accuracy compared to traditional multivariate and multiple testing approaches, while at the same time estimating each feature's contribution to the final result. Comment: 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). 6 pages, 3 figures
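
    A minimal sketch of the two-step procedure follows. Each pair (x_i, y_i) defines a bisecting-hyperplane rule with normal y_i - x_i passing through the pair's midpoint; how those per-pair rules are aggregated into one scoring function is not spelled out in the abstract, so the componentwise median of pairwise averages used below (a Hodges-Lehmann-style pseudomedian over the rule parameters) is an assumption, as are the function names and the synthetic data.

        import numpy as np
        from itertools import combinations_with_replacement
        from scipy.stats import wilcoxon

        def paired_test(X, Y):
            """Two-step test on paired samples X, Y of shape (n, p)."""
            # Step 1: per-pair bisecting-hyperplane rules (normal and offset),
            # aggregated by a componentwise pseudomedian of their parameters.
            W = Y - X                                    # per-pair hyperplane normals
            B = np.einsum('ij,ij->i', W, (X + Y) / 2)    # per-pair offsets (normal . midpoint)
            params = np.hstack([W, B[:, None]])
            pairs = combinations_with_replacement(range(len(X)), 2)
            walsh = np.array([(params[i] + params[j]) / 2 for i, j in pairs])
            agg = np.median(walsh, axis=0)
            w_star, b_star = agg[:-1], agg[-1]
            # Step 2: score both samples with the aggregated rule and run a
            # Wilcoxon signed-rank test on the 1-d representation.
            return wilcoxon(Y @ w_star - b_star, X @ w_star - b_star)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(40, 30))
        Y = X + rng.normal(0.2, 1.0, size=(40, 30))      # paired sample with a small shift
        print(paired_test(X, Y))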

    To tree or not to tree? Assessing the impact of smoothing the decision boundaries

    When analyzing a dataset, it can be useful to assess how smooth the decision boundaries need to be for a model to better fit the data. This paper addresses this question by proposing to quantify how much the 'rigid' decision boundaries, produced by an algorithm that naturally finds such solutions, should be relaxed to obtain a performance improvement. The approach we propose starts with the rigid decision boundaries of a seed Decision Tree (seed DT), which is used to initialize a Neural DT (NDT). The initial boundaries are challenged by relaxing them progressively through training the NDT. During this process, we measure the NDT's performance and its decision agreement with the seed DT. We show how these two measures can help the user figure out how expressive the model should be before exploring it further via model selection. The validity of our approach is demonstrated with experiments on simulated and benchmark datasets. Comment: 12 pages, 3 figures, 3 tables. arXiv admin note: text overlap with arXiv:2006.1145
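
    A lightweight way to see the two quantities being monitored (test performance and decision agreement with the rigid seed model) is sketched below. It does not reproduce the paper's DT-to-NDT initialization: a small MLP stands in for the Neural DT and is simply trained in chunks, and the dataset, model sizes, and schedule are illustrative assumptions.

        import numpy as np
        from sklearn.datasets import make_moons
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Rigid baseline: the seed decision tree and its test predictions.
        seed_dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
        seed_pred = seed_dt.predict(X_te)

        # Stand-in for the NDT: a small neural model trained progressively
        # (warm_start lets each fit() call continue from the previous state).
        relaxed = MLPClassifier(hidden_layer_sizes=(16,), max_iter=50,
                                warm_start=True, random_state=0)

        for step in range(5):
            relaxed.fit(X_tr, y_tr)
            acc = relaxed.score(X_te, y_te)
            agreement = np.mean(relaxed.predict(X_te) == seed_pred)
            print(f"step {step}: accuracy={acc:.3f}, agreement with seed DT={agreement:.3f}")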